Intro from Skeleton Code (with some edits to add more features to subset):

setwd('/Users/robin/Desktop/Assignment 1 MSDS 410')
mydata <- read.csv(file="ames_housing_data.csv", header=TRUE, sep=",")

str(mydata)
## 'data.frame':    2930 obs. of  82 variables:
##  $ SID          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ PID          : int  526301100 526350040 526351010 526353030 527105010 527105030 527127150 527145080 527146030 527162130 ...
##  $ SubClass     : int  20 20 20 20 60 60 120 120 120 60 ...
##  $ Zoning       : chr  "RL" "RH" "RL" "RL" ...
##  $ LotFrontage  : int  141 80 81 93 74 78 41 43 39 60 ...
##  $ LotArea      : int  31770 11622 14267 11160 13830 9978 4920 5005 5389 7500 ...
##  $ Street       : chr  "Pave" "Pave" "Pave" "Pave" ...
##  $ Alley        : chr  NA NA NA NA ...
##  $ LotShape     : chr  "IR1" "Reg" "IR1" "Reg" ...
##  $ LandContour  : chr  "Lvl" "Lvl" "Lvl" "Lvl" ...
##  $ Utilities    : chr  "AllPub" "AllPub" "AllPub" "AllPub" ...
##  $ LotConfig    : chr  "Corner" "Inside" "Corner" "Corner" ...
##  $ LandSlope    : chr  "Gtl" "Gtl" "Gtl" "Gtl" ...
##  $ Neighborhood : chr  "NAmes" "NAmes" "NAmes" "NAmes" ...
##  $ Condition1   : chr  "Norm" "Feedr" "Norm" "Norm" ...
##  $ Condition2   : chr  "Norm" "Norm" "Norm" "Norm" ...
##  $ BldgType     : chr  "1Fam" "1Fam" "1Fam" "1Fam" ...
##  $ HouseStyle   : chr  "1Story" "1Story" "1Story" "1Story" ...
##  $ OverallQual  : int  6 5 6 7 5 6 8 8 8 7 ...
##  $ OverallCond  : int  5 6 6 5 5 6 5 5 5 5 ...
##  $ YearBuilt    : int  1960 1961 1958 1968 1997 1998 2001 1992 1995 1999 ...
##  $ YearRemodel  : int  1960 1961 1958 1968 1998 1998 2001 1992 1996 1999 ...
##  $ RoofStyle    : chr  "Hip" "Gable" "Hip" "Hip" ...
##  $ RoofMat      : chr  "CompShg" "CompShg" "CompShg" "CompShg" ...
##  $ Exterior1    : chr  "BrkFace" "VinylSd" "Wd Sdng" "BrkFace" ...
##  $ Exterior2    : chr  "Plywood" "VinylSd" "Wd Sdng" "BrkFace" ...
##  $ MasVnrType   : chr  "Stone" "None" "BrkFace" "None" ...
##  $ MasVnrArea   : int  112 0 108 0 0 20 0 0 0 0 ...
##  $ ExterQual    : chr  "TA" "TA" "TA" "Gd" ...
##  $ ExterCond    : chr  "TA" "TA" "TA" "TA" ...
##  $ Foundation   : chr  "CBlock" "CBlock" "CBlock" "CBlock" ...
##  $ BsmtQual     : chr  "TA" "TA" "TA" "TA" ...
##  $ BsmtCond     : chr  "Gd" "TA" "TA" "TA" ...
##  $ BsmtExposure : chr  "Gd" "No" "No" "No" ...
##  $ BsmtFinType1 : chr  "BLQ" "Rec" "ALQ" "ALQ" ...
##  $ BsmtFinSF1   : int  639 468 923 1065 791 602 616 263 1180 0 ...
##  $ BsmtFinType2 : chr  "Unf" "LwQ" "Unf" "Unf" ...
##  $ BsmtFinSF2   : int  0 144 0 0 0 0 0 0 0 0 ...
##  $ BsmtUnfSF    : int  441 270 406 1045 137 324 722 1017 415 994 ...
##  $ TotalBsmtSF  : int  1080 882 1329 2110 928 926 1338 1280 1595 994 ...
##  $ Heating      : chr  "GasA" "GasA" "GasA" "GasA" ...
##  $ HeatingQC    : chr  "Fa" "TA" "TA" "Ex" ...
##  $ CentralAir   : chr  "Y" "Y" "Y" "Y" ...
##  $ Electrical   : chr  "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
##  $ FirstFlrSF   : int  1656 896 1329 2110 928 926 1338 1280 1616 1028 ...
##  $ SecondFlrSF  : int  0 0 0 0 701 678 0 0 0 776 ...
##  $ LowQualFinSF : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ GrLivArea    : int  1656 896 1329 2110 1629 1604 1338 1280 1616 1804 ...
##  $ BsmtFullBath : int  1 0 0 1 0 0 1 0 1 0 ...
##  $ BsmtHalfBath : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ FullBath     : int  1 1 1 2 2 2 2 2 2 2 ...
##  $ HalfBath     : int  0 0 1 1 1 1 0 0 0 1 ...
##  $ BedroomAbvGr : int  3 2 3 3 3 3 2 2 2 3 ...
##  $ KitchenAbvGr : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ KitchenQual  : chr  "TA" "TA" "Gd" "Ex" ...
##  $ TotRmsAbvGrd : int  7 5 6 8 6 7 6 5 5 7 ...
##  $ Functional   : chr  "Typ" "Typ" "Typ" "Typ" ...
##  $ Fireplaces   : int  2 0 0 2 1 1 0 0 1 1 ...
##  $ FireplaceQu  : chr  "Gd" NA NA "TA" ...
##  $ GarageType   : chr  "Attchd" "Attchd" "Attchd" "Attchd" ...
##  $ GarageYrBlt  : int  1960 1961 1958 1968 1997 1998 2001 1992 1995 1999 ...
##  $ GarageFinish : chr  "Fin" "Unf" "Unf" "Fin" ...
##  $ GarageCars   : int  2 1 1 2 2 2 2 2 2 2 ...
##  $ GarageArea   : int  528 730 312 522 482 470 582 506 608 442 ...
##  $ GarageQual   : chr  "TA" "TA" "TA" "TA" ...
##  $ GarageCond   : chr  "TA" "TA" "TA" "TA" ...
##  $ PavedDrive   : chr  "P" "Y" "Y" "Y" ...
##  $ WoodDeckSF   : int  210 140 393 0 212 360 0 0 237 140 ...
##  $ OpenPorchSF  : int  62 0 36 0 34 36 0 82 152 60 ...
##  $ EnclosedPorch: int  0 0 0 0 0 0 170 0 0 0 ...
##  $ ThreeSsnPorch: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ScreenPorch  : int  0 120 0 0 0 0 0 144 0 0 ...
##  $ PoolArea     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolQC       : chr  NA NA NA NA ...
##  $ Fence        : chr  NA "MnPrv" NA NA ...
##  $ MiscFeature  : chr  NA NA "Gar2" NA ...
##  $ MiscVal      : int  0 0 12500 0 0 0 0 0 0 0 ...
##  $ MoSold       : int  5 6 6 4 3 6 4 1 3 6 ...
##  $ YrSold       : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ SaleType     : chr  "WD " "WD " "WD " "WD " ...
##  $ SaleCondition: chr  "Normal" "Normal" "Normal" "Normal" ...
##  $ SalePrice    : int  215000 105000 172000 244000 189900 195500 213500 191500 236500 189000 ...
head(mydata)
##   SID       PID SubClass Zoning LotFrontage LotArea Street Alley LotShape
## 1   1 526301100       20     RL         141   31770   Pave  <NA>      IR1
## 2   2 526350040       20     RH          80   11622   Pave  <NA>      Reg
## 3   3 526351010       20     RL          81   14267   Pave  <NA>      IR1
## 4   4 526353030       20     RL          93   11160   Pave  <NA>      Reg
## 5   5 527105010       60     RL          74   13830   Pave  <NA>      IR1
## 6   6 527105030       60     RL          78    9978   Pave  <NA>      IR1
##   LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2
## 1         Lvl    AllPub    Corner       Gtl        NAmes       Norm       Norm
## 2         Lvl    AllPub    Inside       Gtl        NAmes      Feedr       Norm
## 3         Lvl    AllPub    Corner       Gtl        NAmes       Norm       Norm
## 4         Lvl    AllPub    Corner       Gtl        NAmes       Norm       Norm
## 5         Lvl    AllPub    Inside       Gtl      Gilbert       Norm       Norm
## 6         Lvl    AllPub    Inside       Gtl      Gilbert       Norm       Norm
##   BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodel RoofStyle
## 1     1Fam     1Story           6           5      1960        1960       Hip
## 2     1Fam     1Story           5           6      1961        1961     Gable
## 3     1Fam     1Story           6           6      1958        1958       Hip
## 4     1Fam     1Story           7           5      1968        1968       Hip
## 5     1Fam     2Story           5           5      1997        1998     Gable
## 6     1Fam     2Story           6           6      1998        1998     Gable
##   RoofMat Exterior1 Exterior2 MasVnrType MasVnrArea ExterQual ExterCond
## 1 CompShg   BrkFace   Plywood      Stone        112        TA        TA
## 2 CompShg   VinylSd   VinylSd       None          0        TA        TA
## 3 CompShg   Wd Sdng   Wd Sdng    BrkFace        108        TA        TA
## 4 CompShg   BrkFace   BrkFace       None          0        Gd        TA
## 5 CompShg   VinylSd   VinylSd       None          0        TA        TA
## 6 CompShg   VinylSd   VinylSd    BrkFace         20        TA        TA
##   Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1
## 1     CBlock       TA       Gd           Gd          BLQ        639
## 2     CBlock       TA       TA           No          Rec        468
## 3     CBlock       TA       TA           No          ALQ        923
## 4     CBlock       TA       TA           No          ALQ       1065
## 5      PConc       Gd       TA           No          GLQ        791
## 6      PConc       TA       TA           No          GLQ        602
##   BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir
## 1          Unf          0       441        1080    GasA        Fa          Y
## 2          LwQ        144       270         882    GasA        TA          Y
## 3          Unf          0       406        1329    GasA        TA          Y
## 4          Unf          0      1045        2110    GasA        Ex          Y
## 5          Unf          0       137         928    GasA        Gd          Y
## 6          Unf          0       324         926    GasA        Ex          Y
##   Electrical FirstFlrSF SecondFlrSF LowQualFinSF GrLivArea BsmtFullBath
## 1      SBrkr       1656           0            0      1656            1
## 2      SBrkr        896           0            0       896            0
## 3      SBrkr       1329           0            0      1329            0
## 4      SBrkr       2110           0            0      2110            1
## 5      SBrkr        928         701            0      1629            0
## 6      SBrkr        926         678            0      1604            0
##   BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual
## 1            0        1        0            3            1          TA
## 2            0        1        0            2            1          TA
## 3            0        1        1            3            1          Gd
## 4            0        2        1            3            1          Ex
## 5            0        2        1            3            1          TA
## 6            0        2        1            3            1          Gd
##   TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt
## 1            7        Typ          2          Gd     Attchd        1960
## 2            5        Typ          0        <NA>     Attchd        1961
## 3            6        Typ          0        <NA>     Attchd        1958
## 4            8        Typ          2          TA     Attchd        1968
## 5            6        Typ          1          TA     Attchd        1997
## 6            7        Typ          1          Gd     Attchd        1998
##   GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive
## 1          Fin          2        528         TA         TA          P
## 2          Unf          1        730         TA         TA          Y
## 3          Unf          1        312         TA         TA          Y
## 4          Fin          2        522         TA         TA          Y
## 5          Fin          2        482         TA         TA          Y
## 6          Fin          2        470         TA         TA          Y
##   WoodDeckSF OpenPorchSF EnclosedPorch ThreeSsnPorch ScreenPorch PoolArea
## 1        210          62             0             0           0        0
## 2        140           0             0             0         120        0
## 3        393          36             0             0           0        0
## 4          0           0             0             0           0        0
## 5        212          34             0             0           0        0
## 6        360          36             0             0           0        0
##   PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
## 1   <NA>  <NA>        <NA>       0      5   2010      WD         Normal
## 2   <NA> MnPrv        <NA>       0      6   2010      WD         Normal
## 3   <NA>  <NA>        Gar2   12500      6   2010      WD         Normal
## 4   <NA>  <NA>        <NA>       0      4   2010      WD         Normal
## 5   <NA> MnPrv        <NA>       0      3   2010      WD         Normal
## 6   <NA>  <NA>        <NA>       0      6   2010      WD         Normal
##   SalePrice
## 1    215000
## 2    105000
## 3    172000
## 4    244000
## 5    189900
## 6    195500
names(mydata)
##  [1] "SID"           "PID"           "SubClass"      "Zoning"       
##  [5] "LotFrontage"   "LotArea"       "Street"        "Alley"        
##  [9] "LotShape"      "LandContour"   "Utilities"     "LotConfig"    
## [13] "LandSlope"     "Neighborhood"  "Condition1"    "Condition2"   
## [17] "BldgType"      "HouseStyle"    "OverallQual"   "OverallCond"  
## [21] "YearBuilt"     "YearRemodel"   "RoofStyle"     "RoofMat"      
## [25] "Exterior1"     "Exterior2"     "MasVnrType"    "MasVnrArea"   
## [29] "ExterQual"     "ExterCond"     "Foundation"    "BsmtQual"     
## [33] "BsmtCond"      "BsmtExposure"  "BsmtFinType1"  "BsmtFinSF1"   
## [37] "BsmtFinType2"  "BsmtFinSF2"    "BsmtUnfSF"     "TotalBsmtSF"  
## [41] "Heating"       "HeatingQC"     "CentralAir"    "Electrical"   
## [45] "FirstFlrSF"    "SecondFlrSF"   "LowQualFinSF"  "GrLivArea"    
## [49] "BsmtFullBath"  "BsmtHalfBath"  "FullBath"      "HalfBath"     
## [53] "BedroomAbvGr"  "KitchenAbvGr"  "KitchenQual"   "TotRmsAbvGrd" 
## [57] "Functional"    "Fireplaces"    "FireplaceQu"   "GarageType"   
## [61] "GarageYrBlt"   "GarageFinish"  "GarageCars"    "GarageArea"   
## [65] "GarageQual"    "GarageCond"    "PavedDrive"    "WoodDeckSF"   
## [69] "OpenPorchSF"   "EnclosedPorch" "ThreeSsnPorch" "ScreenPorch"  
## [73] "PoolArea"      "PoolQC"        "Fence"         "MiscFeature"  
## [77] "MiscVal"       "MoSold"        "YrSold"        "SaleType"     
## [81] "SaleCondition" "SalePrice"
mydata$TotalFloorSF <- mydata$FirstFlrSF + mydata$SecondFlrSF
mydata$HouseAge <- mydata$YrSold - mydata$YearBuilt
mydata$QualityIndex <- mydata$OverallQual * mydata$OverallCond
mydata$logSalePrice <- log(mydata$SalePrice)
mydata$price_sqft <- mydata$SalePrice/mydata$TotalFloorSF
summary(mydata$price_sqft)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.37  100.57  120.43  121.60  140.01  276.25
hist(mydata$price_sqft)

# Note: "HouseStyle", "LotShape", and "LotArea" each appear twice in this list,
# which is why HouseStyle.1, LotShape.1, and LotArea.1 show up in str(subdat).
# "Foundation" is not selected here.
subdat <- subset(mydata, select=c("TotalFloorSF","HouseAge","QualityIndex",
                                  "price_sqft", "SalePrice","LotArea",
                                  "BsmtFinSF1","Neighborhood","HouseStyle",
                                  "LotShape","OverallQual","logSalePrice",
                                  "TotalBsmtSF","HouseStyle","Zoning","LotShape",
                                  "SaleCondition","Functional","LotArea","SubClass",
                                  "LotFrontage","OverallCond","YearBuilt","ExterQual",
                                  "ExterCond","FirstFlrSF","SecondFlrSF","BedroomAbvGr",
                                  "TotRmsAbvGrd","GrLivArea","MiscVal","YearRemodel"))

str(subdat)
## 'data.frame':    2930 obs. of  32 variables:
##  $ TotalFloorSF : int  1656 896 1329 2110 1629 1604 1338 1280 1616 1804 ...
##  $ HouseAge     : int  50 49 52 42 13 12 9 18 15 11 ...
##  $ QualityIndex : int  30 30 36 35 25 36 40 40 40 35 ...
##  $ price_sqft   : num  130 117 129 116 117 ...
##  $ SalePrice    : int  215000 105000 172000 244000 189900 195500 213500 191500 236500 189000 ...
##  $ LotArea      : int  31770 11622 14267 11160 13830 9978 4920 5005 5389 7500 ...
##  $ BsmtFinSF1   : int  639 468 923 1065 791 602 616 263 1180 0 ...
##  $ Neighborhood : chr  "NAmes" "NAmes" "NAmes" "NAmes" ...
##  $ HouseStyle   : chr  "1Story" "1Story" "1Story" "1Story" ...
##  $ LotShape     : chr  "IR1" "Reg" "IR1" "Reg" ...
##  $ OverallQual  : int  6 5 6 7 5 6 8 8 8 7 ...
##  $ logSalePrice : num  12.3 11.6 12.1 12.4 12.2 ...
##  $ TotalBsmtSF  : int  1080 882 1329 2110 928 926 1338 1280 1595 994 ...
##  $ HouseStyle.1 : chr  "1Story" "1Story" "1Story" "1Story" ...
##  $ Zoning       : chr  "RL" "RH" "RL" "RL" ...
##  $ LotShape.1   : chr  "IR1" "Reg" "IR1" "Reg" ...
##  $ SaleCondition: chr  "Normal" "Normal" "Normal" "Normal" ...
##  $ Functional   : chr  "Typ" "Typ" "Typ" "Typ" ...
##  $ LotArea.1    : int  31770 11622 14267 11160 13830 9978 4920 5005 5389 7500 ...
##  $ SubClass     : int  20 20 20 20 60 60 120 120 120 60 ...
##  $ LotFrontage  : int  141 80 81 93 74 78 41 43 39 60 ...
##  $ OverallCond  : int  5 6 6 5 5 6 5 5 5 5 ...
##  $ YearBuilt    : int  1960 1961 1958 1968 1997 1998 2001 1992 1995 1999 ...
##  $ ExterQual    : chr  "TA" "TA" "TA" "Gd" ...
##  $ ExterCond    : chr  "TA" "TA" "TA" "TA" ...
##  $ FirstFlrSF   : int  1656 896 1329 2110 928 926 1338 1280 1616 1028 ...
##  $ SecondFlrSF  : int  0 0 0 0 701 678 0 0 0 776 ...
##  $ BedroomAbvGr : int  3 2 3 3 3 3 2 2 2 3 ...
##  $ TotRmsAbvGrd : int  7 5 6 8 6 7 6 5 5 7 ...
##  $ GrLivArea    : int  1656 896 1329 2110 1629 1604 1338 1280 1616 1804 ...
##  $ MiscVal      : int  0 0 12500 0 0 0 0 0 0 0 ...
##  $ YearRemodel  : int  1960 1961 1958 1968 1998 1998 2001 1992 1996 1999 ...
subdatnum <- subset(mydata, select=c("TotalFloorSF","HouseAge","QualityIndex",
                                     "SalePrice","LotArea","OverallQual","logSalePrice"))

Section 1: Sample Definition
• Remove houses with more than 4,000 square feet of floor space, because data beyond that point are too sparse to support a reliable analysis.
• Remove houses with a quality index below 5 and houses with an overall quality below 2. After reviewing the range of values both features can take (a quality index high of 35 and an overall quality high of 10), these houses appear irregular: they are extremely poor in quality and could add a lot of variability to the analysis. We should focus on houses with more moderate levels of quality and condition.
• Dwellings whose MS Zoning is classified as Commercial or Industrial should also be eliminated from the analysis and regression, so they are not confused with houses in residential areas. Commercial and industrial properties differ significantly from residential properties and follow a different pricing structure, which could be a source of error in the analysis.
• Lot shape IR3 has only a few examples in this dataset, too few for a proper analysis, so these houses should be dropped.
• Sales with an Abnormal, Alloca, or Partial sale condition should not be considered, as their sale prices may vary because of these circumstances, which could cause errors in the analysis.
• Houses without typical Functionality should be removed.
• A lot area above 100,000 is far higher than the mean lot area, and values above it appear to be outliers. Houses with lots larger than this should be removed, primarily because so little data is available above that size.
• We should also remove entries with N/A in the features being analyzed, so that the analysis uses only observed values and avoids errors down the line.
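
The drop conditions above can be applied one at a time while recording how many rows survive each step, which is what a waterfall table reports. The sketch below is a minimal illustration on a small made-up data frame, not the code used to build the analysis sample. The helper `apply_waterfall`, the toy values, and the category codes `"C (all)"`, `"I (all)"`, and `"Abnorml"` are assumptions made for the example.

```r
# Toy data standing in for mydata; every value here is invented for illustration.
toy <- data.frame(
  TotalFloorSF  = c(1500, 4500, 1200, 1800, 2000, 1600, 1700, 1400),
  QualityIndex  = c(30, 36, 4, 35, 40, 25, 30, 36),
  OverallQual   = c(6, 6, 1, 7, 8, 5, 6, 6),
  Zoning        = c("RL", "RL", "RL", "C (all)", "RL", "RL", "RM", "RL"),
  LotShape      = c("Reg", "Reg", "Reg", "Reg", "IR3", "IR1", "Reg", "Reg"),
  SaleCondition = c("Normal", "Normal", "Normal", "Normal", "Normal",
                    "Partial", "Normal", "Normal"),
  Functional    = c("Typ", "Typ", "Typ", "Typ", "Typ", "Typ", "Min1", "Typ"),
  LotArea       = c(9000, 12000, 8000, 7000, 10000, 9500, 8800, 150000),
  stringsAsFactors = FALSE
)

# Each rule returns TRUE for rows to KEEP. The category codes are assumed.
drop_rules <- list(
  "floor area <= 4000"        = function(d) d$TotalFloorSF <= 4000,
  "quality not extremely low" = function(d) d$QualityIndex >= 5 & d$OverallQual >= 2,
  "residential zoning"        = function(d) !(d$Zoning %in% c("C (all)", "I (all)")),
  "lot shape not IR3"         = function(d) d$LotShape != "IR3",
  "normal sale condition"     = function(d) !(d$SaleCondition %in%
                                              c("Abnorml", "Alloca", "Partial")),
  "typical functionality"     = function(d) d$Functional == "Typ",
  "lot area <= 100000"        = function(d) d$LotArea <= 100000
)

# Hypothetical helper: apply the rules in order and record the surviving row count.
apply_waterfall <- function(data, rules) {
  counts <- c(start = nrow(data))
  for (rule_name in names(rules)) {
    keep <- rules[[rule_name]](data)
    # Rows where the rule evaluates to NA are dropped as well.
    data <- data[!is.na(keep) & keep, , drop = FALSE]
    counts[rule_name] <- nrow(data)
  }
  # Final step: keep only rows with no missing values.
  data <- data[complete.cases(data), , drop = FALSE]
  counts["complete cases"] <- nrow(data)
  list(data = data, counts = counts)
}

result <- apply_waterfall(toy, drop_rules)
print(result$counts)
```

Keeping the rules in a named list makes the order of the drops explicit, and the names double as the row labels of the waterfall table.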

Waterfall included

Section 2: Data Quality Check
• 20 analyzed features:

Total Floor SF
House Style
Lotshape
House Age
MS SubClass
Lot Area
Lot Frontage
Overall Qual
Overall Cond
Year built
Exterior Quality
Exterior Condition
Misc Val
1st Flr SF
2nd Flr SF
BedroomAbvGr
TotRmsAbvGrd
Foundation
GrLivArea
YearRemodel

• After viewing a histogram of TotalFloorSF, I can see that data are scarce above about 4,000 square feet, so houses above that value should be removed. In my opinion this feature also has a significant impact on sale price, because house price tends to increase with house size.
• House style is correlated with house size, so it may affect price significantly as well.
• Lotshape is also related to overall property size. The larger the lot, the higher the price should be, holding all else constant. This feature needed to be edited because of the lack of data for ‘IR3’; it is better to eliminate houses with that entry.
• House age could determine house quality, as newer houses may have more features, last longer, and need fewer repairs. The better the condition of the house, the higher its price should be.
• Subclass is also related to house style and house size, as 2 story houses are bigger and have more living area than 1 story houses, and hence could be more expensive holding other variables constant. This feature holds a lot of weight, so it should be analyzed in relation to the other features.
• Lot area is another feature that contributes to the size of the property.
• Lot frontage also contributes to size via lot area, and seems to be a heavily weighted feature. It can also indicate whether the property is in a more isolated area or is connected to the street. Properties in these two kinds of areas could vary in price significantly.
• Overall Condition and Overall Quality go hand in hand and describe how good the property is. The higher the value, the higher the price typically is, making them highly weighted features.
• Year built is related to age, which may indicate quality. Some individuals may also prefer older homes because of the materials used, which could contribute heavily to house price.
• Exterior quality and condition also indicate how much repair the house needs. If the house needs more repairs and both values are low, its price may be lower as well, making these heavily weighted features for analysis.
• Miscellaneous Feature Value also adds to property value significantly. Houses with a pool, shed, basketball court, etc. may be more desirable, which could increase the price.
• 1st floor square feet and 2nd floor square feet contribute to the size of the property. They could also distinguish the house subclass, making them highly weighted features.
• Bedrooms and total rooms above ground also help determine the size of the property and living space. They could also indicate whether larger or multiple families could live in the house. Distinguishing between the two could explain variance in price, making them worth including in the analysis.
• Foundation could indicate the material the house was built with and could help determine the longevity of the property. It needs to be converted into numerical values, however.
• GR Living Area contributes to size and to how much space families have to live in the residence, and so contributes heavily to price.
• The year the property was remodeled could significantly affect the style, features, and durability/quality/condition of the house, making it a heavily weighted feature.

All of these features are what I thought would contribute the most to determining the sale price of the house and are heavily weighted features. Some needed to be edited because of lack of data surrounding the various entries, which may cause greater errors in predictions.
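
Because the same checks (missing values, range, center) are repeated for every numeric feature, they can also be collected into one table. The sketch below shows one way to do that on a small made-up data frame; `quality_check` is a hypothetical helper and the toy values are assumptions, not rows from the analysis sample.

```r
# Toy data; the values and the single NA are invented for illustration.
toy <- data.frame(
  TotalFloorSF = c(1656, 896, 1329, 2110, 1629),
  LotFrontage  = c(141, 80, NA, 93, 74),
  HouseStyle   = c("1Story", "1Story", "1Story", "1Story", "2Story"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row of summary statistics per numeric column.
quality_check <- function(data) {
  numeric_cols <- names(data)[sapply(data, is.numeric)]
  rows <- lapply(numeric_cols, function(col) {
    x <- data[[col]]
    data.frame(
      feature   = col,
      n_missing = sum(is.na(x)),
      min       = min(x, na.rm = TRUE),
      median    = median(x, na.rm = TRUE),
      mean      = mean(x, na.rm = TRUE),
      max       = max(x, na.rm = TRUE),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

checks <- quality_check(toy)
print(checks)
```

Reporting `n_missing` next to the other statistics makes it visible when a minimum of 0 comes from imputed missing values rather than from the data.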


#####################################################################
############# Data Quality Check ###########################
# mydata[!complete.cases(mydata),]  # rows with at least one missing value (left commented out)
##################################################################
print('TotalFloorSF')
## [1] "TotalFloorSF"
hist(subdat7$TotalFloorSF)

summary(subdat7$TotalFloorSF)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     492    1092    1418    1464    1717    3820
quantile(subdat7$TotalFloorSF)
##     0%    25%    50%    75%   100% 
##  492.0 1092.0 1418.5 1717.0 3820.0
print('Subclass')
## [1] "Subclass"
hist(subdat7$SubClass)
summary(subdat7$SubClass)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   20.00   20.00   50.00   58.14   70.00  190.00
quantile(subdat7$SubClass)
##   0%  25%  50%  75% 100% 
##   20   20   50   70  190
print('LotShape Plot')
## [1] "LotShape Plot"
require(ggplot2)
## Loading required package: ggplot2

ggplot(subdat7) +
  geom_bar( aes(LotShape) ) +
  ggtitle("Number of houses per Lotshape") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

print('House Age')
## [1] "House Age"
hist(subdat7$HouseAge)

summary(subdat7$HouseAge)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.00   34.50   36.95   53.00  136.00
quantile(subdat7$HouseAge)
##    0%   25%   50%   75%  100% 
##   0.0   9.0  34.5  53.0 136.0
print('Lot Area')
## [1] "Lot Area"
plot(subdat7$LotArea)

summary(subdat7$LotArea)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1300    7352    9306    9617   11250   70761
quantile(subdat7$LotArea)
##      0%     25%     50%     75%    100% 
##  1300.0  7352.5  9305.5 11250.0 70761.0
print('Lot Frontage')
## [1] "Lot Frontage"
hist(subdat7$LotFrontage)

summary(subdat7$LotFrontage)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   37.00   60.00   55.23   76.00  313.00
subdat7[is.na(subdat7)] <- 0  # replace any remaining NAs with 0
quantile(subdat7$LotFrontage)
##   0%  25%  50%  75% 100% 
##    0   37   60   76  313
print('House Style')
## [1] "House Style"
require(ggplot2)
ggplot(subdat7) +
  geom_bar( aes(HouseStyle) ) +
  ggtitle("Number of houses per style") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

print('Overall Qual')
## [1] "Overall Qual"
hist(as.numeric(subdat7$OverallQual))

summary(as.numeric(subdat7$OverallQual))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   6.089   7.000  10.000
quantile(as.numeric(subdat7$OverallQual))
##   0%  25%  50%  75% 100% 
##    3    5    6    7   10
mean(as.numeric(subdat7$OverallQual))
## [1] 6.089427
print("Overall Cond")
## [1] "Overall Cond"
hist(as.numeric(subdat7$OverallCond))

summary(as.numeric(subdat7$OverallCond))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   5.000   5.000   5.664   6.000   9.000
quantile(as.numeric(subdat7$OverallCond))
##   0%  25%  50%  75% 100% 
##    2    5    5    6    9
mean(subdat7$OverallCond)
## [1] 5.663877
print("YearBuilt")
## [1] "YearBuilt"
hist(subdat7$YearBuilt)

summary(subdat7$YearBuilt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1872    1955    1973    1971    1998    2010
quantile(subdat7$YearBuilt)
##   0%  25%  50%  75% 100% 
## 1872 1955 1973 1998 2010
print("ExterQual")
## [1] "ExterQual"
summary(mydata$ExterQual)
##    Length     Class      Mode 
##      2930 character character
print("ExterCond")
## [1] "ExterCond"
summary(mydata$ExterCond)
##    Length     Class      Mode 
##      2930 character character
print("FirstFlrSF")
## [1] "FirstFlrSF"
hist(subdat7$FirstFlrSF)

summary(subdat7$FirstFlrSF)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     442     864    1052    1125    1335    3820
quantile(subdat7$FirstFlrSF)
##      0%     25%     50%     75%    100% 
##  442.00  864.00 1052.00 1334.75 3820.00
print("SecondFlrSF")
## [1] "SecondFlrSF"
hist(subdat7$SecondFlrSF)

summary(subdat7$SecondFlrSF)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0     0.0   339.3   707.0  1836.0
quantile(subdat7$SecondFlrSF)
##   0%  25%  50%  75% 100% 
##    0    0    0  707 1836
print("BedroomAbvGr")
## [1] "BedroomAbvGr"
hist(subdat7$BedroomAbvGr)

summary(subdat7$BedroomAbvGr)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   3.000   2.864   3.000   6.000
quantile(subdat7$BedroomAbvGr)
##   0%  25%  50%  75% 100% 
##    0    2    3    3    6
print("TotRmsAbvGrd")
## [1] "TotRmsAbvGrd"
hist(subdat7$TotRmsAbvGrd)

summary(subdat7$TotRmsAbvGrd)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   6.354   7.000  13.000
quantile(subdat7$TotRmsAbvGrd)
##   0%  25%  50%  75% 100% 
##    3    5    6    7   13
print("Foundation")
## [1] "Foundation"
summary(subdat7$Foundation)
## Length  Class   Mode 
##      0   NULL   NULL
print("miscVal")
## [1] "miscVal"
hist(subdat7$MiscVal)

summary(subdat7$MiscVal)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0     0.0    48.5     0.0 15500.0
quantile(subdat7$MiscVal)
##    0%   25%   50%   75%  100% 
##     0     0     0     0 15500
print("GrLivArea")
## [1] "GrLivArea"
hist(subdat7$GrLivArea)

summary(subdat7$GrLivArea)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     492    1095    1422    1467    1718    3820
quantile(subdat7$GrLivArea)
##      0%     25%     50%     75%    100% 
##  492.00 1095.25 1422.00 1717.75 3820.00
print("YearRemodel")
## [1] "YearRemodel"
hist(subdat7$YearRemodel)

summary(subdat7$YearRemodel)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1950    1966    1992    1984    2002    2010
quantile(subdat7$YearRemodel)
##   0%  25%  50%  75% 100% 
## 1950 1966 1992 2002 2010
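
The `summary()` calls on `ExterQual` and `ExterCond` above report only length, class, and mode because those columns are character vectors, and the call on `Foundation` returns NULL because that column is not part of the subset. The sketch below shows, on made-up values, how a frequency table, a column-presence check, and an ordinal encoding could be used instead. The ordering Po < Fa < TA < Gd < Ex is an assumption about how the quality codes rank.

```r
# Toy data; the values are invented for illustration.
toy <- data.frame(
  ExterQual = c("TA", "TA", "Gd", "Ex", "TA", "Fa"),
  stringsAsFactors = FALSE
)

# A frequency table is more informative than summary() for character columns.
exter_counts <- table(toy$ExterQual, useNA = "ifany")
print(exter_counts)

# Check that a column exists before summarising it.
has_foundation <- "Foundation" %in% names(toy)
if (!has_foundation) {
  message("Foundation is not a column of this data frame")
}

# Ordinal encoding, assuming the quality codes rank Po < Fa < TA < Gd < Ex.
quality_levels <- c("Po", "Fa", "TA", "Gd", "Ex")
toy$ExterQualNum <- as.integer(factor(toy$ExterQual, levels = quality_levels))
print(toy)
```

An integer encoding like this only makes sense for ordered codes. A nominal feature such as Foundation has no natural order, so treating it as a factor and letting the model create indicator variables is the more usual choice.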

Section 3: Initial Exploratory Data Analysis
• Out of the twenty features checked above, these ten seemed the most important, as they are numeric and easy to analyze. Graphing the variables shows their distributions. Those with roughly normal distributions can be analyzed directly, while others may need more preprocessing before they can be included in the model. Second floor square footage is one example: it is heavily skewed to the right, probably because the many one-story homes have a value of 0. For modeling, it and 1st floor square footage should be replaced by the feature ‘GrLivArea,’ which covers all of the living area in the property.
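
The skew in second floor square footage and the case for using GrLivArea can be checked numerically. The sketch below uses the six rows printed by `head(mydata)` earlier in this report; the `skewness` helper is a hypothetical moment-based definition written for this example, and six rows are far too few to draw conclusions from.

```r
# The first six rows shown by head(mydata), copied by hand.
toy <- data.frame(
  FirstFlrSF   = c(1656, 896, 1329, 2110, 928, 926),
  SecondFlrSF  = c(0, 0, 0, 0, 701, 678),
  LowQualFinSF = c(0, 0, 0, 0, 0, 0),
  GrLivArea    = c(1656, 896, 1329, 2110, 1629, 1604)
)

# Share of houses with no second floor.
share_zero <- mean(toy$SecondFlrSF == 0)

# Hypothetical helper: moment-based sample skewness (positive = right-skewed).
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}
second_floor_skew <- skewness(toy$SecondFlrSF)

# Check whether GrLivArea equals the sum of its floor-area components.
parts_sum <- with(toy, FirstFlrSF + SecondFlrSF + LowQualFinSF)
all_match <- all(parts_sum == toy$GrLivArea)

print(c(share_zero = share_zero, skewness = second_floor_skew))
print(all_match)
```

If the component sum matches GrLivArea across the full sample as well, keeping all of these columns in one model would introduce redundant predictors, which supports using GrLivArea alone.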

#################################################################
################## univariate EDA ##############################
###############################################################
require(ggplot2)
ggplot(subdat7) +
  geom_bar( aes(LotShape) ) +
  ggtitle("Number of houses per Lotshape") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=SalePrice)) + 
  geom_histogram(color="black", binwidth= 10000) +
  labs(title="Distribution of Sale Price") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=TotalFloorSF)) + 
  geom_histogram(color="black", binwidth= 100) +
  labs(title="Distribution of TotalFloorSF") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=QualityIndex)) + 
  geom_histogram(color="black", binwidth= 10) +
  labs(title="Distribution of QualityIndex") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=LotArea)) + 
  geom_histogram(color="black", binwidth= 1000) +
  labs(title="Distribution of LotArea") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=FirstFlrSF)) + 
  geom_histogram(color="black", binwidth= 10) +
  labs(title="Distribution of First Floor SF") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=SecondFlrSF)) + 
  geom_histogram(color="black", binwidth= 10) +
  labs(title="Distribution of Second FL SF") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=GrLivArea)) + 
  geom_histogram(color="black", binwidth= 10) +
  labs(title="Distribution of GR Liv Area") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=YearBuilt)) + 
  geom_histogram(color="black", binwidth= 10) +
  labs(title="Distribution of Year Built") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=YearRemodel)) + 
  geom_histogram(color="black", binwidth= 10) +
  labs(title="Distribution of YearRemodel") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

#######################################################################
########### bivariate EDA ########################################
###################################################################
ggplot(subdat7, aes(x=TotalFloorSF, y=QualityIndex)) + 
  geom_point(color="blue", shape=1) +
  ggtitle("Scatter Plot of Total Floor SF vs QualityIndex") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=TotalFloorSF, y=HouseAge)) + 
  geom_point(color="blue", shape=1) +
  ggtitle("Scatter Plot of Total Floor SF vs HouseAge") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=LotShape, y=HouseAge)) + 
  geom_boxplot(fill="blue") +
  labs(title="Distribution of HouseAge by Lotshape") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

Section 4: Exploratory Data Analysis for Modeling
• I chose three variables: the two I thought would have the most linear relationship to sale price, TotalFloorSF and QualityIndex, and one categorical feature, Lotshape. TotalFloorSF and QualityIndex both appear to have a fairly linear relationship with sale price, as price tends to increase with each of them. However, neither relationship is strictly linear, which shows that other features also matter for sale price. Lotshape does not seem to have much of a relationship to sale price: in the boxplot, prices vary widely within each lot shape. The boxplot also shows how many outliers there are within this feature, which may need more preprocessing before being included in the model. These plots give insight into which features are heavily weighted and appear necessary for accurate analysis, and which need to be edited, omitted, or replaced in the model.
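
The visual impression of linearity can be backed up with a correlation coefficient and a simple regression, and the categorical feature can be compared through group medians. The sketch below uses the six rows printed by `head(mydata)` and `str(subdat)` earlier in this report, so the numbers it produces only illustrate the method and say nothing reliable about the full sample.

```r
# The first six rows shown earlier in this report, copied by hand.
toy <- data.frame(
  TotalFloorSF = c(1656, 896, 1329, 2110, 1629, 1604),
  QualityIndex = c(30, 30, 36, 35, 25, 36),
  LotShape     = c("IR1", "Reg", "IR1", "Reg", "IR1", "IR1"),
  SalePrice    = c(215000, 105000, 172000, 244000, 189900, 195500),
  stringsAsFactors = FALSE
)

# Pearson correlation of each numeric predictor with sale price.
r_floor   <- cor(toy$TotalFloorSF, toy$SalePrice)
r_quality <- cor(toy$QualityIndex, toy$SalePrice)

# Simple linear regression: the slope is the estimated price change per square foot.
fit <- lm(SalePrice ~ TotalFloorSF, data = toy)
slope <- coef(fit)[["TotalFloorSF"]]

# Median sale price within each lot shape.
median_by_shape <- tapply(toy$SalePrice, toy$LotShape, median)

print(c(r_floor = r_floor, r_quality = r_quality, slope = slope))
print(median_by_shape)
```

A correlation near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests either no relationship or a nonlinear one, so the scatter plots remain necessary alongside these numbers.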

############################################################
################ model focussed EDA #######################
###########################################################

ggplot(subdat7, aes(x=TotalFloorSF, y=SalePrice)) + 
  geom_point(color="blue", size=2) +
  ggtitle("Scatter Plot of Sale Price vs Total Floor SF") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5)) +
  geom_smooth(method=lm, se=FALSE)  ## method=lm, se=FALSE ###
## `geom_smooth()` using formula 'y ~ x'

ggplot(subdat7, aes(x=QualityIndex, y=SalePrice)) + 
  geom_point(color="blue", shape=1) +
  ggtitle("Scatter Plot of Sale Price vs QualityIndex") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5)) 

ggplot(subdat7, aes(x=LotShape, y=SalePrice)) + 
  geom_boxplot(fill="blue") +
  labs(title="Distribution of Sale Price by LotShape") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))

ggplot(subdat7, aes(x=YearRemodel, y=SalePrice)) + 
  geom_point(color="blue", size=2) +
  ggtitle("Scatter Plot of Sale Price vs YearRemodel") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5)) +
  geom_smooth(method=lm, se=FALSE)  ## method=lm, se=FALSE ###
## `geom_smooth()` using formula 'y ~ x'

ggplot(subdat7, aes(x=YearBuilt, y=SalePrice)) + 
  geom_point(color="blue", shape=1) +
  ggtitle("Scatter Plot of Sale Price vs YearBuilt") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5)) 

## LotArea is continuous, so a scatter plot is used here instead of a boxplot
ggplot(subdat7, aes(x=LotArea, y=SalePrice)) + 
  geom_point(color="blue", shape=1) +
  ggtitle("Scatter Plot of Sale Price vs LotArea") +
  theme(plot.title=element_text(lineheight=0.8, face="bold", hjust=0.5))
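LotArea is a continuous variable, so a boxplot of SalePrice against it only makes sense once the lot sizes are grouped. The sketch below shows one possible way to do that with quantile-based bins in base R. `add_lot_area_bin`, the `LotAreaBin` column, and the choice of four bins are all assumptions made for illustration, and the rows of `toy` are made-up values rather than Ames records.

```r
# Hypothetical helper: add a factor column that groups LotArea into
# quantile-based bins so it can serve as a discrete x aesthetic.
add_lot_area_bin <- function(df, n_bins = 4) {
  breaks <- quantile(df$LotArea,
                     probs = seq(0, 1, length.out = n_bins + 1),
                     na.rm = TRUE)
  # unique() guards against repeated break points when many lots share a size
  df$LotAreaBin <- cut(df$LotArea, breaks = unique(breaks),
                       include.lowest = TRUE, dig.lab = 6)
  df
}

# Made-up rows for illustration only; not taken from the Ames data.
toy <- data.frame(
  LotArea   = c(5000, 6000, 8000, 9000, 10000, 11000, 14000, 30000),
  SalePrice = c(110000, 125000, 140000, 150000, 165000, 180000, 210000, 250000)
)

binned <- add_lot_area_bin(toy)
print(table(binned$LotAreaBin))
```

Applied to the assignment data, `add_lot_area_bin(subdat7)` could then be plotted with `aes(x=LotAreaBin, y=SalePrice)` and `geom_boxplot(fill="blue")`, which gives one box per lot-size group.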

#####################################################################
############# EDA for multiple variables ###########################
##################################################################
require(GGally)
## Loading required package: GGally
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
ggpairs(subdat7, cardinality_threshold=NULL)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

require(lattice)
## Loading required package: lattice
pairs(subdatnum, pch = 21)

require(corrplot)
## Loading required package: corrplot
## corrplot 0.84 loaded
mcor <- cor(subdatnum)
corrplot(mcor, method="shade", shade.col=NA, tl.col="black",tl.cex=0.5)
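A correlation plot is easier to act on when the column for the response is also read off as a ranked list. The following is a minimal sketch of that step in base R. `rank_by_target` is a hypothetical helper, it assumes the correlation matrix has a `SalePrice` column, and the `toy` data frame holds made-up numbers purely so the example is self-contained.

```r
# Hypothetical helper: order predictors by |correlation| with the target.
rank_by_target <- function(cor_mat, target = "SalePrice") {
  stopifnot(target %in% colnames(cor_mat))
  r <- cor_mat[, target]
  r <- r[names(r) != target]          # drop the target's self-correlation
  r[order(abs(r), decreasing = TRUE)]
}

# Made-up numeric columns for illustration only; not the Ames data.
toy <- data.frame(
  SalePrice = c(1, 2, 3, 4, 5),
  A         = c(1, 2, 3, 4, 6),
  B         = c(5, 1, 4, 2, 3)
)

ranked <- rank_by_target(cor(toy, use = "pairwise.complete.obs"))
print(round(ranked, 2))
```

With the assignment's objects this would be `rank_by_target(mcor)`, assuming `subdatnum` contains a `SalePrice` column. If any numeric column has missing values, passing `use = "pairwise.complete.obs"` to `cor()` keeps the matrix from filling with NA.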

Section 5: Summary/Conclusions • This assignment allowed me to explore the data and clean it up to find the features that may be most beneficial for building a model, or at least those that may have the greatest impact. It also let me warm up my R and RStudio skills for the assignments to come, which may involve modeling and more statistical analysis. It prepared me to read the data and visualize which features could be used in a model as they are, which need editing, which should be combined with others to better characterize a property, and which should be omitted because they may not contribute to the model in a positive way. The various visualizations produced in this assignment helped me further develop the skill set of analyzing data and exploring ways to create a more accurate model.